A calculation method to estimate thermal conductivity of high entropy ceramic for thermal barrier coatings
High entropy ceramics are highly promising as next-generation thermal barrier
coatings due to their unique disordered structure, which imparts ultra-low
thermal conductivity and good high-temperature stability. Unlike traditional
ceramic materials, the thermal resistance in high entropy ceramics
predominantly arises from phonon-disorder scattering rather than phonon-phonon
interactions. In this study, we propose a calculation method based on the
supercell phonon unfolding (SPU) technique to predict the thermal conductivity
of high entropy ceramics, especially focusing on rocksalt oxide structures. Our
prediction method uses the reciprocal of the SPU phonon spectral
linewidth as an indicator of phonon lifetime. The obtained results demonstrate
strong agreement between the predicted thermal conductivities and
experimental measurements, validating the feasibility of our calculation
method. Furthermore, we extensively investigate and discuss the atomic
relaxation and lattice distortion effects in 5-dopant and 6-dopant rocksalt
structures during the process.
Comment: 19 pages, 8 figures
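The core idea, using the reciprocal of the spectral linewidth as a phonon lifetime inside the standard kinetic-theory conductivity sum, can be sketched as follows. This is an illustrative outline only: the variable names, the factor relating linewidth to lifetime, and every number below are assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical sketch: in the kinetic-theory expression
#   kappa = (1/3) * sum_i C_i * v_i^2 * tau_i,
# approximate the mode lifetime tau_i from the SPU spectral linewidth
# Gamma_i as tau_i ~ 1 / (2 * Gamma_i). All values below are invented.

def thermal_conductivity(heat_capacity, group_velocity, linewidth):
    """Per-mode heat capacity C_i [J/(m^3 K)], group velocity v_i [m/s],
    and SPU linewidth Gamma_i [rad/s] -> kappa [W/(m K)]."""
    tau = 1.0 / (2.0 * linewidth)   # lifetime from the linewidth reciprocal
    return np.sum(heat_capacity * group_velocity**2 * tau) / 3.0

# toy three-mode example
C = np.array([1.0e5, 8.0e4, 5.0e4])         # J/(m^3 K)
v = np.array([4000.0, 3000.0, 1500.0])      # m/s
gamma = np.array([2.0e12, 3.0e12, 5.0e12])  # rad/s

print(thermal_conductivity(C, v, gamma))    # kappa in W/(m K)
```

In practice the sum would run over all unfolded phonon modes and q-points of the supercell, with the linewidths read off from the SPU spectra.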
Forcing you to experience wonder: Unconsciously biasing people’s choice through strategic physical positioning
Magicians have developed powerful tools to covertly force a spectator to choose a specific card. We investigate the physical location force, in which four cards (from left to right: 1-2-3-4) are placed face-down on the table in a line, after which participants are asked to push out one card. The force is thought to rely on a behavioural bias in that people are more likely to choose the third card from their left. Participants felt that their choice was extremely free, yet 60% selected the third card. There was no significant difference in estimates and feelings of freedom between those who chose the target card (i.e. the third card) and those who selected a different card, and participants underestimated the actual proportion of people who selected the target card. These results illustrate that participants' behaviour was heavily biased towards choosing the third card, yet they were oblivious to this bias.
Improved speaker independent lip reading using speaker adaptive training and deep neural networks
Recent improvements in tracking and feature extraction mean that speaker-dependent lip-reading of continuous speech using a medium-size vocabulary (around 1000 words) is realistic. However, the recognition of previously unseen speakers has been found to be a very challenging task, because of the large variation in lip-shapes across speakers and the lack of large, tracked databases of visual features, which are very expensive to produce. By adapting a technique that is established in speech recognition but has not previously been used in lip-reading, we show that error rates for speaker-independent lip-reading can be very significantly reduced. Furthermore, we show that error rates can be reduced even further by the additional use of Deep Neural Networks (DNNs). We also find that there is no need to map phonemes to visemes for context-dependent visual speech transcription.
Good me bad me: Prioritization of the Good-Self during perceptual decision-making
Finding phonemes: improving machine lip-reading
In machine lip-reading there is continued debate and research
around the correct classes to be used for recognition.
In this paper we use a structured approach for devising
speaker-dependent viseme classes, which enables the creation
of a set of phoneme-to-viseme maps where each has a different
quantity of visemes ranging from two to 45. Viseme classes
are based upon the mapping of articulated phonemes, which
have been confused during phoneme recognition, into viseme
groups.
Using these maps, with the LiLIR dataset, we show the
effect of changing the viseme map size in speaker-dependent
machine lip-reading, measured by word recognition correctness
and so demonstrate that word recognition with phoneme classifiers
is not just possible, but often better than word recognition
with viseme classifiers. Furthermore, there are intermediate
units between visemes and phonemes which perform better still.
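The mapping procedure described above, grouping phonemes into viseme classes based on their confusions during phoneme recognition, can be illustrated with a small sketch. This is one plausible way such a map could be derived (greedy merging of the most mutually confused classes); the paper's actual clustering procedure may differ, and the confusion counts below are invented.

```python
import numpy as np

# Hypothetical sketch: derive viseme groups from a phoneme confusion
# matrix by repeatedly merging the two classes most confused with each
# other, until the desired number of viseme classes remains.

def build_viseme_map(confusion, phonemes, n_visemes):
    """confusion[i][j] = how often phoneme i was recognised as phoneme j."""
    groups = [[p] for p in phonemes]
    conf = confusion.astype(float).copy()
    while len(groups) > n_visemes:
        # symmetric confusion between classes i and j, ignoring the diagonal
        sym = conf + conf.T
        np.fill_diagonal(sym, -1.0)
        i, j = divmod(int(sym.argmax()), sym.shape[0])
        if i > j:
            i, j = j, i
        groups[i].extend(groups.pop(j))            # merge class j into class i
        conf[i] += conf[j]
        conf[:, i] += conf[:, j]
        conf = np.delete(np.delete(conf, j, 0), j, 1)
    return groups

phonemes = ["p", "b", "m", "f", "v"]
confusion = np.array([[9, 4, 3, 0, 0],
                      [5, 8, 2, 0, 0],
                      [2, 3, 9, 1, 0],
                      [0, 0, 1, 7, 6],
                      [0, 0, 0, 5, 8]])
print(build_viseme_map(confusion, phonemes, 2))
# -> [['p', 'b', 'm'], ['f', 'v']]: bilabials group apart from labiodentals
```

Varying `n_visemes` from two up to the full phoneme count produces the family of phoneme-to-viseme maps of different sizes that the paper evaluates.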
Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed
Speechreading, or lipreading, is the technique of extracting phonetic
information from a speaker's visual features, such as the movement of the lips,
face, teeth and tongue. It has a wide range of multimedia applications, such as
in surveillance, Internet telephony, and as an aid to people with hearing
impairments. However, most of the work in speechreading has been limited to
text generation from silent videos. Recently, research has started venturing
into generating (audio) speech from silent video sequences, but there have been
no developments thus far in dealing with divergent views and poses of a
speaker. Thus, although multiple camera feeds of a speaker's speech may be
available, they have not been used to handle these
different poses. To this end, this paper presents the world's first
multi-view speech reading and reconstruction system. This work extends the
boundaries of multimedia research by putting forth a model which leverages
silent video feeds from multiple cameras recording the same subject to generate
intelligible speech for a speaker. Initial results confirm the usefulness of
exploiting multiple camera views in building an efficient speech reading and
reconstruction system. They further show the optimal placement of cameras,
which leads to the maximum intelligibility of speech. Finally, the paper lays out various
innovative applications for the proposed system, focusing on its potentially
prodigious impact not just in the security arena but in many other multimedia
analytics problems.
Comment: 2018 ACM Multimedia Conference (MM '18), October 22--26, 2018, Seoul,
Republic of Korea
Resolution limits on visual speech recognition
Visual-only speech recognition is dependent upon a number of factors that can be difficult to control, such as: lighting; identity; motion; emotion and expression. But some factors, such as video resolution, are controllable, so it is surprising that there is not yet a systematic study of the effect of resolution on lip-reading. Here we use a new data set, the Rosetta Raven data, to train and test recognizers so we can measure the effect of video resolution on recognition accuracy. We conclude that, contrary to common practice, resolution need not be that great for automatic lip-reading. However, it is highly unlikely that automatic lip-reading can work reliably when the distance between the bottom of the lower lip and the top of the upper lip is less than four pixels at rest.
Which phoneme-to-viseme maps best improve visual-only computer lip-reading?
A critical assumption of all current visual speech recognition systems is that there are visual speech units, called visemes, which can be mapped to units of acoustic speech, the phonemes. Despite there being a number of published maps, their effectiveness is rarely tested, particularly on visual-only lip-reading (many works use audio-visual speech). Here we examine 120 mappings and consider whether any are stable across talkers. We show a method for devising maps based on phoneme confusions from an automated lip-reading system, and we present new mappings that show improvements for individual talkers.
ControlCom: Controllable Image Composition using Diffusion Model
Image composition aims to synthesize a realistic composite image from a
pair of foreground and background images. Recently, generative composition
methods have been built on large pretrained diffusion models to generate composite
images, considering their great potential in image generation. However, they
suffer from a lack of controllability over foreground attributes and poor
preservation of foreground identity. To address these challenges, we propose a
controllable image composition method that unifies four tasks in one diffusion
model: image blending, image harmonization, view synthesis, and generative
composition. Meanwhile, we design a self-supervised training framework coupled
with a tailored pipeline of training data preparation. Moreover, we propose a
local enhancement module to enhance the foreground details in the diffusion
model, improving the foreground fidelity of composite images. The proposed
method is evaluated on both a public benchmark and real-world data, which
demonstrates that our method can generate more faithful and controllable
composite images than existing approaches. The code and model will be available
at https://github.com/bcmi/ControlCom-Image-Composition
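For context, the baseline that generative composition methods improve upon is simple mask-based cut-and-paste, which preserves foreground identity exactly but cannot harmonize lighting or synthesize new views. A minimal sketch of that baseline, with purely illustrative shapes and values, not the paper's model:

```python
import numpy as np

# Naive cut-and-paste image composition: alpha-blend a foreground onto a
# background using a mask. Identity is preserved perfectly, but lighting
# mismatch and viewpoint are untouched, which is the gap diffusion-based
# methods such as ControlCom address.

def paste_composite(foreground, background, mask):
    """foreground, background: HxWx3 float arrays in [0, 1];
    mask: HxW alpha matte in [0, 1] (1 = keep foreground)."""
    alpha = mask[..., None]                    # broadcast over RGB channels
    return alpha * foreground + (1.0 - alpha) * background

fg = np.full((4, 4, 3), 0.9)                   # bright foreground patch
bg = np.full((4, 4, 3), 0.2)                   # dark background
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0                           # paste a 2x2 region
out = paste_composite(fg, bg, mask)
print(out[2, 2], out[0, 0])                    # foreground vs background pixel
```

A generative compositor replaces this fixed blend with a learned model that can additionally relight, re-pose, or re-render the foreground.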